sf packageIn this mini-workshop we will introduce the sf package, show some examples of geospatial analysis, work with base plotting of sf objects, and show how mapview can be used to map these objects. It is assumed that you have R and RStudio installed and that you, at a minimum, understand the basic concepts of the R language (e.g. you can work throughR For Cats).
Also as an aside, I am learning the sf package right now so, we will be learning all of this together!
sf packageThings are changing quickly in the R/spatial analysis world and the most fundamental change is via the sf package. This package aims to replace sp, rgdal, and rgeos. There are a lot of reasons why this is a good thing, but that is a bit beyond the scope of this workshop Suffice it to say it should make things faster and simpler!
To get started, lets get `sf installed:
install.packages("sf")
library("sf")
It does rely on having access the GDAL, GEOS, and Proj.4 libraries. On Windows and Mac this should be pretty straightforward.
The first exercise won’t be too thrilling, but we need to make sure everyone has the packages installed.
sf.sf.dplyr already, make sure it is installed.dplyr.sfSo, what does sf actually provide us? It is an implementation of an ISO standard for storing spatial data. It forms the basis for many of the common vector data models and is centered on the concept of a “feature”. Essentially a feature is any object in the real world. There are many different types of features and there are different details that get stored about each. For details on this the first sf vignette does a really nice job. For this mini-workshop we are going to focus on three feature types, POINT, LINESTRING, and POLYGON. For each of the types, there will be coordinates stored as dimensions, a coordinate reference system, and attributes.
We can grab some data directly from the Rhode Island Geographic Information System (RIGIS) for these examples. This code assumes you have a data folder in your current workspace.
# Municipal Boundaries
download.file(url = "http://www.rigis.org/geodata/bnd/muni97d.zip",
destfile = "data/muni97d.zip")
unzip(zipfile = "data/muni97d.zip",
exdir = "data")
# Streams
download.file(url = "http://www.rigis.org/geodata/hydro/streams.zip",
destfile = "data/streams.zip")
unzip(zipfile = "data/streams.zip",
exdir = "data")
# Potential Growth Centers
download.file(url = "http://www.rigis.org/geodata/plan/growth06.zip",
destfile = "data/growth06.zip")
unzip(zipfile = "data/growth06.zip",
exdir = "data")
# Land Use/Land Cover
download.file(url = "http://www.rigis.org/geodata/plan/rilc11d.zip",
destfile = "data/rilc11d.zip")
unzip(zipfile = "data/rilc11d.zip",
exdir = "data")
To pull the shapefiles in we can simply use the st_read() function. This will create an object which is a simple feature collection of, in our case, POINT, LINESTRING, or POLYGON. As an aside, many of the sf functions and all of the ones we will be using start with st_. This stands for “spatial” and “temporal”. Take a look below for examples reading in each of our datasets.
growth_cent <- st_read("data/growth06.shp")
## Reading layer `growth06' from data source `/data/jhollist/geospatial_with_sf/data/growth06.shp' using driver `ESRI Shapefile'
## Simple feature collection with 21 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 260137.3 ymin: 32916.7 xmax: 418116.3 ymax: 326549.2
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
streams <- st_read("data/streams.shp")
## Reading layer `streams' from data source `/data/jhollist/geospatial_with_sf/data/streams.shp' using driver `ESRI Shapefile'
## Simple feature collection with 4470 features and 8 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 234010.1 ymin: 31361.37 xmax: 430921.9 ymax: 340865.8
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
muni <- st_read("data/muni97d.shp")
## Reading layer `muni97d' from data source `/data/jhollist/geospatial_with_sf/data/muni97d.shp' using driver `ESRI Shapefile'
## Simple feature collection with 396 features and 12 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 220310.4 ymin: 23048.49 xmax: 432040.9 ymax: 340916.6
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
lulc <- st_read("data/rilc11d.shp")
## Reading layer `rilc11d' from data source `/data/jhollist/geospatial_with_sf/data/rilc11d.shp' using driver `ESRI Shapefile'
## Simple feature collection with 68186 features and 5 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 218380.4 ymin: 23048.5 xmax: 434592 ymax: 343508
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
One of the benefits of using sf is the speed. In my tests it is about twice as fast as the prior standard of sp and rgdal. Let’s look at a biggish shape file with 1 million points!
1 million points
#The old way
system.time(rgdal::readOGR("data","big"))
## OGR data source with driver: ESRI Shapefile
## Source: "data", layer: "big"
## with 1000000 features
## It has 1 fields
## user system elapsed
## 10.758 0.138 10.909
#The sf way
system.time(st_read("data/big.shp"))
## Reading layer `big' from data source `/data/jhollist/geospatial_with_sf/data/big.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1000000 features and 1 field
## geometry type: POINT
## dimension: XY
## bbox: xmin: -71.03768 ymin: 41.05976 xmax: -69.09763 ymax: 43.00856
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## user system elapsed
## 5.244 0.097 5.343
sf objectsOne of the nicest features (pun intended) about sf objects is that they are nothing more than data frames. The data for each feature (e.g. the attributes in ESRI speak) are stored in the data frames first columns. The last column of the data frame is a “geometry” column which holds the coordinates, and coordinate reference system information. I say this is nice becuase we don’t need to completely learn a new way of working with spatial data. Much of what we now about working with plain old tabular data frames will also work with sf objects.
We can use many of our base R skills directly with these objects.
head(muni)
## Simple feature collection with 6 features and 12 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: 246748.8 ymin: 282524.2 xmax: 360358 ymax: 340916.6
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
## AREA PERIMETER RITOWN5K_ RITOWN5K_I NAME MCD CFIPS
## 1 788769737 134174.19 2 380 CUMBERLAND 20 7
## 2 219887686 61203.98 3 1 WOONSOCKET 80 7
## 3 693661121 122732.51 4 2 NORTH SMITHFIELD 55 7
## 4 1588026224 166984.85 5 3 BURRILLVILLE 5 7
## 5 527562041 117805.34 6 381 LINCOLN 45 7
## 6 769680403 111353.48 7 4 SMITHFIELD 75 7
## COUNTY OSP CFIPS_MCD TWNCODE LAND geometry
## 1 PROVIDENCE 8 7020 CU Y POLYGON((339469.65625 34051...
## 2 PROVIDENCE 39 7080 WO Y POLYGON((320285.09375 33947...
## 3 PROVIDENCE 25 7055 NS Y POLYGON((299086.65625 33886...
## 4 PROVIDENCE 3 7005 BU Y POLYGON((303493.28125 31006...
## 5 PROVIDENCE 17 7045 LI Y POLYGON((331461.675 320529....
## 6 PROVIDENCE 31 7075 SM Y POLYGON((303493.28125 31006...
streams$NAME[1:10]
## [1] Mill River No Name Round Top Brook Round Top Brook
## [5] Peters River Indian Brook Chockalog River Chockalog River
## [9] Chockalog River Chockalog River
## 242 Levels: Abbott Run Brook Acid Factory Brook ... Woonasquatucket River
str(growth_cent)
## Classes 'sf' and 'data.frame': 21 obs. of 3 variables:
## $ NAME : Factor w/ 21 levels "BLOCK ISLAND",..: 8 5 14 9 21 4 3 17 13 20 ...
## $ Cntr_Class: Factor w/ 2 levels "Town","Village": 2 2 2 2 2 2 2 2 2 2 ...
## $ geometry : List of 21 , printing Classes 'XY', 'POINT', 'sfg' num [1:2] 279907 321759
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
## ..- attr(*, "names")= chr "NAME" "Cntr_Class"
sf objects with dplyr, yes, dplyr!However, what is truly awesome, is the ability for us to now use dplyr to manipulate our spatial data and pipe our spatial workflows. First let’s make sure dplyr is loaded up.
library("dplyr")
Now we can work with our spatial data. Let’s filter out the towns that are more than
big_muni <- muni %>%
mutate(area_skm = AREA*.000000092903) %>%
group_by(NAME) %>%
summarize(sum_area_skm = sum(area_skm)) %>%
filter(sum_area_skm >= 100)
big_muni
## Simple feature collection with 11 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 220310.4 ymin: 23048.49 xmax: 432040.9 ymax: 340916.6
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=41.08333333333334 +lon_0=-71.5 +k=0.99999375 +x_0=99999.99999999999 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
## # A tibble: 11 × 3
## NAME sum_area_skm geometry
## <fctr> <dbl> <simple_feature>
## 1 BURRILLVILLE 147.5324 <POLYGON((303...>
## 2 COVENTRY 161.7605 <POLYGON((315...>
## 3 EXETER 151.2190 <POLYGON((315...>
## 4 FOSTER 134.6021 <POLYGON((276...>
## 5 GLOCESTER 147.1955 <POLYGON((303...>
## 6 HOPKINTON 114.3237 <POLYGON((248...>
## 7 NORTH KINGSTOWN 114.1386 <MULTIPOLYGON...>
## 8 RICHMOND 105.5525 <POLYGON((297...>
## 9 SCITUATE 141.9332 <POLYGON((276...>
## 10 SOUTH KINGSTOWN 155.9371 <MULTIPOLYGON...>
## 11 WEST GREENWICH 132.6521 <POLYGON((315...>
And first time I tried this, this happened:
mind_blown
Let’s try the same thing with our land use/land cover data, except without the filter. Use dplyr and
There are default plotting methods for these sf objects, so we can use our base plotting tools to create some quick maps.
plot(muni\(geometry, col = muni\)COUNTY)